Assignment #1 - Colorizing the Prokudin-Gorskii Photo Collection

Adithya Praveen

I. Project Introduction

Overview

In this homework, I explain my experiments and results in colorizing the Prokudin-Gorskii Photo Collection!

Approach

I implemented two algorithms for image alignment, and experimented with the features they operate on:

  • The slower single scale method
  • The much faster multi-scale (image-pyramid) alignment.
  • Additionally, I tried image alignment using raw pixel values and edge values as features - in combination with techniques like adaptive border cropping. I go into a lot more detail about my assumptions and experiment settings in the Results and Bells and Whistles sections.

    Challenges

    In no particular order, I would say the more challenging aspects of this homework were:

  • Finding a good set of hyper-parameters that worked for all images using my implementation of the multi-scale image alignment algorithm.
  • Getting emir.tif to align properly, and experimenting with different image features that facilitate better alignment.
  • Restraining myself to only a few additional images from the Gorskii Collection to test my algorithm - there are so many good ones!

    II. Results

    Alignment using Raw Pixels

    Before displaying the results of my best algorithm, let's take a look at the results of my implementation of the multiscale image alignment using just raw pixel intensities as features for alignment.

    A few notes:

  • Border crop justification: Since I'm using np.roll, the channel shape does not change during alignment. Said differently, I don't crop off the rows / cols of pixels that don't align; they simply roll over to the other end (horizontally or vertically). Therefore, to make alignment work, I use a thresholding mechanism (as described in 3. Automatic Border Cropping) to remove borders and minimize the artifacts that result from the alignment process.
  • Image pyramid hyper-parameter choice: I chose to go with a three-level pyramid (scaling down by a factor of 2 at each level), with a row displacement in the range [-50, 50] and a column displacement in the range [-20, 20]. I wanted to cut down on runtime, and found that reducing the column displacement range did not have a significant effect on the alignment quality of the given images. These hyper-parameters allow me to align each image in about 60 seconds.
  • SSD over NCC: I chose to go with SSD, since it was faster to compute than NCC, and in my experiments both SSD and NCC gave me the exact same alignment offsets for both the red and green channels. Therefore, I proceeded with SSD, and all images below were aligned using this metric.
  • Conclusion: The results look decent, except that the Emir of Bukhara would be rolling in his grave if he saw his aligned image :( - check out 1. Better Features: Canny Edge, where we fix this issue!
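
    The multi-scale search described in the notes above can be sketched roughly as follows. This is a minimal illustration, not my exact code: the function names, the 2x2 block-average downsampling, and the small refinement window (refine) at the finer levels are all assumptions made for the sketch. Only the use of np.roll, SSD, a factor-of-2 pyramid, and the [-50, 50] / [-20, 20] ranges come from the notes.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means better aligned."""
    return float(np.sum((a - b) ** 2))

def ncc(a, b):
    """Normalized cross-correlation; higher means better aligned."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def search(channel, reference, row_range, col_range):
    """Exhaustive single-scale search for the (row, col) shift with the lowest SSD."""
    best_shift, best_score = (0, 0), np.inf
    for dr in row_range:
        for dc in col_range:
            # np.roll wraps pixels around, so the shape never changes.
            shifted = np.roll(channel, (dr, dc), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_shift, best_score = (dr, dc), score
    return best_shift

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed; any resize would do)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def pyramid_align(channel, reference, levels=3, max_row=50, max_col=20, refine=2):
    """Coarse-to-fine alignment: full search at the coarsest level, then refine."""
    if levels == 1:
        return search(channel, reference,
                      range(-max_row, max_row + 1),
                      range(-max_col, max_col + 1))
    coarse = pyramid_align(downsample(channel), downsample(reference),
                           levels - 1, max_row, max_col, refine)
    # A shift of k pixels at half resolution is 2k pixels at this resolution.
    r0, c0 = 2 * coarse[0], 2 * coarse[1]
    return search(channel, reference,
                  range(r0 - refine, r0 + refine + 1),
                  range(c0 - refine, c0 + refine + 1))
```

    With float channels r, g and b of equal shape, pyramid_align(r, b) would return the (row, col) shift to pass to np.roll so that r lines up with b. Swapping ssd for ncc would mean maximizing the score instead of minimizing it.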

    Note: The values in the red and green boxes denote the alignment offsets for the red and green channels respectively.

    Low Res Images

    (-4, 3) (-3, 2)
    Cathedral

    High Res Images

    (-4, 30) (-12, 32)
    Harvesters
    (100, 46) (46, 32)
    Icon
    (-46, 20) (-20, 16)
    Lady
    (-48, 74) (-46, 52)
    Self Portrait
    (-32, -232) (-36, 48)
    Emir of Bukhara
    (206, 18) (100, 30)
    Three Generations
    (128, 64) (64, 12)
    Train
    (-18, 60) (-16, 36)
    Turkmen
    (-2, 46) (-12, 20)
    Village

    Prokudin-Gorskii - Extended Cut

    Going through the LoC's gallery of Gorskii's digitized collection was a lot of fun! I particularly enjoyed colorizing photos of people in different traditional attire. Here are my top picks. The first three are low-resolution, and the rest are high-resolution images.

    Low Res Images

    (-2, -3) (-2, -2)
    Island of Capri
    (-12, -3) (-6, -1)
    Lilacs
    (2, -1) (-2, -1)
    Lake Saimaa

    High Res Images

    (4, 78) (-16, 48)
    Armenian Woman
    (32, -50) (-12, -12)
    Church of Transfiguration
    (0, 18) (-78, 14)
    Dagestani Types
    (-18, 32) (-16, 32)
    Fruit Stand
    (-64, 68) (-32, 38)
    In Italy
    (-32, -50) (-68, -30)
    Railroad workers

    III. Bells & Whistles

    1. Better Features: Canny Edge

    We improve upon the image alignments in Alignment using Raw Pixels by applying the Canny edge detector [cv2.Canny()] at each level of the pyramid. Alignment is then done on the Canny edge maps instead of the raw color channels. The same scene point can have very different intensities in the red, green and blue channels, which confuses a raw-pixel comparison; edge locations, on the other hand, mostly agree across channels, so aligning on edge features is far more robust to these intensity differences.

    — Here's the before and after of the emir.tif image file. The Emir of Bukhara may now rest in peace!

    Emir - Before
    Emir - After

    — And here's the entire gallery of high-res images aligned using Canny Edge features.

    High Res Images

    (-4, 22) (-4, 34)
    Harvesters
    (102, 46) (48, 32)
    Icon
    (-44, 20) (-22, 18)
    Lady
    (-48, 74) (-46, 52)
    Self Portrait
    (-62, 78) (-44, 46)
    Emir of Bukhara
    (206, 16) (108, 28)
    Three Generations
    (128, 64) (64, 14)
    Train
    (-20, 56) (-14, 38)
    Turkmen
    (-4, 44) (-14, 18)
    Village

    2. Better Color Mapping

    In an attempt to produce more realistic colors in the aligned images, I first converted the image from BGR space to LAB space to control the perceptual lightness independently. To be specific, I multiplied the lightness channel by a scaling factor of 0.95. I then transformed the image to HSV space to control the Saturation and Value. Keeping the Hue unchanged, I scaled Saturation and Value by factors of 1.4 and 1.1 respectively. Finally, I converted back to BGR space before saving the image. A few examples of the resulting images are shown below.

    Emir - Before
    Emir - After
    Turkmen - Before
    Turkmen - After
    Church - Before
    Church - After

    3. Automatic Border Cropping

    The .jpg and .tif files usually have a black border, as well as a white border outside it. Getting rid of these borders makes the alignment procedure more reliable, since the borders do not carry any image content to match. Not just that, the resulting composite image is also likely to be more aesthetically pleasing to the viewer.

    A fixed strategy of shaving off 10% of the image from all sides would probably work decently, but I wanted to preserve more of Gorskii's images while minimizing border artifacts. Therefore, I went with a slightly better adaptive approach.

    To start off, I consider a pixel to be white if its intensity is greater than 250, and black if its intensity is less than 10 (both thresholds are divided by 255, since we're dealing with float images). From there, I find the fraction of pixels that are black / white in every row / column of the raw image. Finally, if this fraction exceeds 70%, I simply delete that specific row / column. The rationale behind using a threshold well below 100% is to be more robust to artifacts often seen on borders (like the numbers written on the border in emir.tif).
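
    A minimal sketch of this thresholding step, for a single 2D float image in [0, 1], is given below. The function name auto_crop is hypothetical, and counting black and white pixels together in one fraction is an assumption of the sketch; the 250, 10 and 70% values are the ones described above.

```python
import numpy as np

def auto_crop(img, white=250 / 255, black=10 / 255, max_fraction=0.7):
    """Drop rows / columns in which too many pixels are near-white or near-black."""
    border_like = (img > white) | (img < black)
    row_fraction = border_like.mean(axis=1)  # one value per row
    col_fraction = border_like.mean(axis=0)  # one value per column
    keep_rows = row_fraction <= max_fraction
    keep_cols = col_fraction <= max_fraction
    return img[keep_rows][:, keep_cols]
```

    For example, a 20 x 30 image with a 2-pixel white frame and a 1-pixel black frame around a mid-gray interior comes back as a 14 x 24 image containing only the interior.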

    The images shown in Results, as mentioned before, are a result of this automatic border cropping procedure. In this bells and whistles section, I also add an additional step that detects the edge artifacts resulting from the use of np.roll (using a Sobel edge filter in combination with three different thresholds) and removes those as well. Here are three comparisons:

    Lilacs - Before
    Lilacs - After
    Train - Before
    Train - After
    Dagestani - Before
    Dagestani - After